    Image Retrieval using Textual Cues

    We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method that does not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.

    CVIT-MT Systems for WAT-2018

    This document describes the machine translation system used in the submissions of IIIT-Hyderabad CVIT-MT for the WAT-2018 English-Hindi translation task. Performance is evaluated on the associated corpus provided by the organizers. We experimented with convolutional sequence-to-sequence architectures. We also trained with additional data obtained through backtranslation.

    Scene Text Recognition using Higher Order Language Priors

    The problem of recognizing text in images taken in the wild has gained significant attention from the computer vision community in recent years. Contrary to recognition of printed documents, recognizing scene text is a challenging problem. We focus on the problem of recognizing text extracted from natural scene images and the web. Significant attempts have been made to address this problem in the recent past. However, many of these works benefit from the availability of strong context, which naturally limits their applicability. In this work we present a framework that uses a higher order prior computed from an English dictionary to recognize a word, which may or may not be a part of the dictionary. We show experimental results on publicly available datasets. Furthermore, we introduce a large challenging word dataset with five thousand words to evaluate various steps of our method exhaustively. The main contributions of this work are: (1) We present a framework that incorporates higher order statistical language models to recognize words in an unconstrained manner (i.e. we overcome the need for restricted word lists, and instead use an English dictionary to compute the priors). (2) We achieve significant improvement (more than 20%) in word recognition accuracies without using a restricted word list. (3) We introduce a large word recognition dataset (at least 5 times larger than other public datasets) with character-level annotation and benchmark it.

    An MRF Model for Binarization of Natural Scene Text

    Inspired by the success of MRF models for solving object segmentation problems, we formulate the binarization problem in this framework. We represent the pixels in a document image as random variables in an MRF, and introduce a new energy (or cost) function on these variables. Each variable takes a foreground or background label, and the quality of the binarization (or labelling) is determined by the value of the energy function. We minimize the energy function, i.e. find the optimal binarization, using an iterative graph cut scheme. Our model is robust to variations in foreground and background colours as we use a Gaussian Mixture Model in the energy function. In addition, our algorithm is efficient to compute, and adapts to a variety of document images. We show results on word images from the challenging ICDAR 2003 dataset, and compare our performance with previously reported methods. Our approach shows significant improvement in pixel-level accuracy as well as OCR accuracy.
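
As a rough illustration of the kind of energy this abstract describes, a generic pairwise MRF energy over binary pixel labels is sketched below. The specific unary and pairwise terms (a negative log-likelihood under a per-label Gaussian Mixture Model, and a contrast-sensitive smoothness term with hypothetical parameters \lambda and \beta) are a common textbook choice assumed here for illustration; the abstract does not state the authors' exact formulation.

```latex
% x_i \in \{0, 1\}: background / foreground label of pixel i
% z_i: observed colour of pixel i;  \mathcal{N}: set of neighbouring pixel pairs
E(\mathbf{x}) = \sum_{i} E_i(x_i) + \lambda \sum_{(i,j) \in \mathcal{N}} E_{ij}(x_i, x_j)

% Assumed unary term: negative log-likelihood under the GMM of label x_i
E_i(x_i) = -\log p(z_i \mid x_i)

% Assumed pairwise term: penalise label changes between similar-coloured neighbours
E_{ij}(x_i, x_j) = [x_i \neq x_j] \, \exp\!\left(-\beta \, \lVert z_i - z_j \rVert^2\right)
```

In a scheme of this kind, an iterative procedure would typically alternate between re-estimating the GMM parameters from the current labelling and minimizing E with a graph cut.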

    Scene Text Recognition and Retrieval for Large Lexicons

    In this paper we propose a framework for recognition and retrieval tasks in the context of scene text images. In contrast to many of the recent works, we focus on the case where an image-specific list of words, known as the small lexicon setting, is unavailable. We present a conditional random field model defined on potential character locations and the interactions between them. Observing that the interaction potentials computed in the large lexicon setting are less effective than in the case of a small lexicon, we propose an iterative method, which alternates between finding the most likely solution and refining the interaction potentials. We evaluate our method on public datasets and show that it improves over baseline and state-of-the-art approaches. For example, we obtain nearly 15% improvement in recognition accuracy and precision for our retrieval task over baseline methods on the IIIT-5K word dataset, with a large lexicon containing 0.5 million words.

    All-sky search for long-duration gravitational wave transients with initial LIGO

    We present the results of a search for long-duration gravitational wave transients in two sets of data collected by the LIGO Hanford and LIGO Livingston detectors between November 5, 2005 and September 30, 2007, and July 7, 2009 and October 20, 2010, with a total observational time of 283.0 days and 132.9 days, respectively. The search targets gravitational wave transients of duration 10-500 s in a frequency band of 40-1000 Hz, with minimal assumptions about the signal waveform, polarization, source direction, or time of occurrence. All candidate triggers were consistent with the expected background; as a result we set 90% confidence upper limits on the rate of long-duration gravitational wave transients for different types of gravitational wave signals. For signals from black hole accretion disk instabilities, we set upper limits on the source rate density between 3.4×10⁻⁵ and 9.4×10⁻⁴ Mpc⁻³ yr⁻¹ at 90% confidence. These are the first results from an all-sky search for unmodeled long-duration transient gravitational waves. © 2016 American Physical Society

    Search for Tensor, Vector, and Scalar Polarizations in the Stochastic Gravitational-Wave Background

    The detection of gravitational waves with Advanced LIGO and Advanced Virgo has enabled novel tests of general relativity, including direct study of the polarization of gravitational waves. While general relativity allows for only two tensor gravitational-wave polarizations, general metric theories can additionally predict two vector and two scalar polarizations. The polarization of gravitational waves is encoded in the spectral shape of the stochastic gravitational-wave background, formed by the superposition of cosmological and individually unresolved astrophysical sources. Using data recorded by Advanced LIGO during its first observing run, we search for a stochastic background of generically polarized gravitational waves. We find no evidence for a background of any polarization, and place the first direct bounds on the contributions of vector and scalar polarizations to the stochastic background. Under log-uniform priors for the energy in each polarization, we limit the energy densities of tensor, vector, and scalar modes at 95% credibility to Ω_0^T < 5.58×10⁻⁸, Ω_0^V < 6.35×10⁻⁸, and Ω_0^S < 1.08×10⁻⁷ at a reference frequency f_0 = 25 Hz. © 2018 American Physical Society

    Learning support order for manipulation in clutter

    IEEE Robotics and Automation Society (RAS); IEEE Industrial Electronics Society (IES); The Robotics Society of Japan (RSJ); The Society of Instrument and Control Engineers (SICE); New Technology Foundation (NTF)
    2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013 -- 3 November 2013 through 8 November 2013 -- Tokyo -- 102443
    Understanding positional semantics of the environment plays an important role in manipulating an object in clutter. The interaction with surrounding objects in the environment must be considered in order to perform the task without causing the objects to fall or get damaged. In this paper, we learn the semantics in terms of support relationships among different objects in a cluttered environment by utilizing various photometric and geometric properties of the scene. To manipulate an object of interest, we use the inferred support relationships to derive a sequence in which its surrounding objects should be removed while causing minimal damage to the environment. We believe this work can push the boundary of robotic applications in grasping, object manipulation and picking-from-bin towards objects of generic shape and size and scenarios with physical contact and overlap. We have created an RGBD dataset that consists of various objects used in day-to-day life present in clutter. We explore many different settings involving different kinds of object-object interaction. We successfully learn support relationships and predict support order in these settings. © 2013 IEEE
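
To make the idea of deriving a removal sequence from support relationships concrete, here is a minimal sketch. It assumes the support relationships are already known and given by hand as a dictionary (the paper infers them from RGBD data, which is not attempted here), and it uses a plain depth-first post-order traversal rather than the authors' method. The function name `removal_order`, the `supports` mapping, and the object names are all hypothetical.

```python
from typing import Dict, List, Set


def removal_order(supports: Dict[str, Set[str]], target: str) -> List[str]:
    """Return an order for removing objects so that `target` comes out last.

    `supports[a]` is the (assumed, hand-specified) set of objects resting
    directly on `a`. Everything resting on an object, directly or
    indirectly, is removed before that object.
    """
    order: List[str] = []
    visiting: Set[str] = set()  # objects on the current DFS path
    done: Set[str] = set()      # objects already placed in `order`

    def visit(obj: str) -> None:
        if obj in done:
            return
        if obj in visiting:
            raise ValueError(f"cyclic support relation involving {obj!r}")
        visiting.add(obj)
        # Sort for a deterministic order among objects at the same level.
        for upper in sorted(supports.get(obj, ())):
            visit(upper)
        visiting.discard(obj)
        done.add(obj)
        order.append(obj)  # post-order: appended after everything on top

    visit(target)
    return order


if __name__ == "__main__":
    # Hypothetical scene: a pen on a book, the book and a cup on a box,
    # and the box on a table.
    scene = {"table": {"box"}, "box": {"book", "cup"}, "book": {"pen"}}
    print(removal_order(scene, "box"))
```

Objects that do not rest on the target, such as the table in this scene, are left untouched. A learned system would also have to handle uncertain or mutually supporting objects, which this sketch simply rejects as a cycle.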